
PL condition



On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms
Lam M. Nguyen

Neural Information Processing Systems

The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD, which matches the mainstream practical heuristics. We show convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings.
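To make the shuffling heuristic concrete, here is a minimal random-reshuffling SGD loop in Python. This is an illustrative sketch, not the paper's algorithm; the per-sample gradient oracle `grad_i` and all parameter names are assumptions introduced for illustration.

```python
import numpy as np

def shuffling_sgd(w, grad_i, n, lr=0.01, epochs=10, seed=0):
    """Random-reshuffling SGD: each epoch visits every one of the n samples
    exactly once in a freshly shuffled order, matching the common practice
    of shuffling the dataset at the start of each epoch."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        perm = rng.permutation(n)          # reshuffle the n sample indices
        for i in perm:                     # one full pass over the shuffled data
            w = w - lr * grad_i(w, i)      # step on the i-th sample's gradient
    return w
```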



A Generalized Alternating Method for Bilevel Optimization under the Polyak-Łojasiewicz Condition

Neural Information Processing Systems

Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting.
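To illustrate the alternating scheme the abstract refers to, here is a minimal Python sketch of an alternating gradient method for a bilevel problem min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y). The lower-level gradient oracle `grad_g_y` and the upper-level (implicit) hypergradient oracle `hypergrad_f` are assumed inputs introduced for illustration, not the paper's construction; in practice the hypergradient is typically obtained via the implicit function theorem.

```python
import numpy as np

def alternating_bilevel(x, y, grad_g_y, hypergrad_f, lr_x=0.01, lr_y=0.1,
                        inner_steps=10, outer_steps=100):
    """Alternating gradient sketch for  min_x f(x, y*(x))  subject to
    y*(x) = argmin_y g(x, y): alternate a few inner GD steps on the
    lower-level objective with one (implicit) gradient step on x."""
    for _ in range(outer_steps):
        for _ in range(inner_steps):        # approximately solve the lower level
            y = y - lr_y * grad_g_y(x, y)
        x = x - lr_x * hypergrad_f(x, y)    # upper-level (implicit) gradient step
    return x, y
```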


Our Technical Contributions and the Importance of the Work

Neural Information Processing Systems

We thank the reviewers for their constructive comments. We provide the first convergence result for alternating GDA for more general problems. Unlike simultaneous GDA, our alternating GDA uses different learning rates for the primal and dual variables. Our analysis handles the PL function by introducing a balancing parameter to establish the contraction. Significant advances have been made by utilizing the PL condition in these fields.
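As a concrete illustration of alternating GDA with separate primal and dual learning rates, here is a minimal Python sketch for a min-max problem min_x max_y f(x, y). The gradient oracles `grad_x` and `grad_y` and the step sizes are assumptions for illustration; this is not the exact algorithm analyzed in the paper.

```python
import numpy as np

def alternating_gda(x, y, grad_x, grad_y, lr_x=0.01, lr_y=0.1, steps=1000):
    """Alternating gradient descent-ascent for  min_x max_y f(x, y).
    Unlike simultaneous GDA, the dual variable y is updated using the
    freshly updated x, and the primal and dual variables use different
    learning rates (lr_x and lr_y)."""
    for _ in range(steps):
        x = x - lr_x * grad_x(x, y)    # primal descent step
        y = y + lr_y * grad_y(x, y)    # dual ascent step at the new x
    return x, y
```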






A Preliminaries

Neural Information Processing Systems

We first introduce the necessary notation. "LB" denotes a lower bound, while "UB" denotes an upper bound. The quantity µ is the PL constant. These rates are derived under the strongly convex assumption, not the general PL condition. This rate is achieved by utilizing increasing (non-constant) mini-batch sizes.
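For reference, the Polyak-Łojasiewicz (PL) condition with constant µ is typically stated as follows; this is the standard textbook form of the definition, with the minimum value f* assumed to exist.

```latex
% PL condition with constant \mu > 0 for a differentiable function f
% whose minimum value f^* is attained:
\[
  \frac{1}{2}\,\bigl\|\nabla f(x)\bigr\|^2 \;\ge\; \mu \bigl(f(x) - f^\ast\bigr)
  \quad \text{for all } x,
  \qquad \text{where } f^\ast = \min_x f(x).
\]
```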